On Enhancing Data Utility in K-anonymization for Data without Hierarchical Taxonomies
Abstract
K-anonymity is a widely used model for protecting the privacy of individuals when publishing microdata. It can be viewed as clustering under the constraint that each group contains at least k tuples. K-anonymity reduces the confidence of linking sensitive information to a specific individual to at most 1/k. However, anonymization causes information loss, which lowers the accuracy of the data in a k-anonymous dataset. Moreover, most current approaches either handle only numerical attributes or, for categorical attributes, require extra information such as attribute hierarchical taxonomies, which often do not exist. In this paper, we propose a new clustering-based model that defines the distance between tuples containing both numerical and categorical attributes without requiring extra information, and we present the SpatialDistance (SD) heuristic algorithm. Experimental comparisons on real datasets between SD and existing well-known algorithms show that SD performs best, offering much higher data utility and significantly reducing information loss.
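The abstract does not spell out SD's distance definition, so the following is only a minimal sketch of the general idea under stated assumptions: it uses a hypothetical mixed distance (range-normalized absolute difference for numerical attributes, 0/1 mismatch for categorical ones), which needs no hierarchical taxonomy but is NOT the paper's SpatialDistance. A simple greedy grouping then enforces the at-least-k constraint, and the worst-case linking confidence is 1 / (smallest group size) ≤ 1/k. The names `mixed_distance`, `greedy_k_clusters`, `max_linking_confidence` and the sample records are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Record = Tuple  # numbers for numerical attributes, strings for categorical ones


def attribute_ranges(records: Sequence[Record], numeric: Set[int]) -> Dict[int, float]:
    """Range (max - min) of each numerical attribute, used to normalize gaps."""
    ranges: Dict[int, float] = {}
    for i in numeric:
        values = [r[i] for r in records]
        ranges[i] = (max(values) - min(values)) or 1  # avoid division by zero
    return ranges


def mixed_distance(a: Record, b: Record, numeric: Set[int], ranges: Dict[int, float]) -> float:
    """Hypothetical mixed distance (NOT the paper's SD definition).

    Numerical attribute: |x - y| / range. Categorical attribute: 0 if equal, else 1.
    No hierarchical taxonomy is needed.
    """
    d = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric:
            d += abs(x - y) / ranges[i]
        else:
            d += 0.0 if x == y else 1.0
    return d


def greedy_k_clusters(records: Sequence[Record], k: int, numeric: Set[int]) -> List[List[Record]]:
    """Greedily partition records into groups that each hold at least k tuples."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    ranges = attribute_ranges(records, numeric)
    remaining = list(records)
    clusters: List[List[Record]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Pull the k - 1 records closest to the seed into its group.
        remaining.sort(key=lambda r: mixed_distance(seed, r, numeric, ranges))
        clusters.append([seed] + remaining[: k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to the group whose seed is
    # closest, so no group drops below k.
    for r in remaining:
        best = min(clusters, key=lambda c: mixed_distance(c[0], r, numeric, ranges))
        best.append(r)
    return clusters


def max_linking_confidence(clusters: List[List[Record]]) -> float:
    """Worst-case chance of linking a record to an individual: 1 / smallest group."""
    return 1 / min(len(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical microdata: (age, occupation).
    data = [(25, "Engineer"), (27, "Engineer"), (52, "Teacher"), (49, "Teacher"), (31, "Nurse")]
    groups = greedy_k_clusters(data, k=2, numeric={0})
    print([len(g) for g in groups])        # [2, 3]
    print(max_linking_confidence(groups))  # 0.5, i.e. at most 1/k
```

Seed-and-nearest-neighbours grouping is just one simple way to satisfy the at-least-k constraint; a full anonymizer would then generalize the values within each group (for example, an age interval or the set of occupations present) and measure the resulting information loss.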
Similar resources
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about social network graph data publishing have increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
Anonymizing classification data using rough set theory
Identity disclosure is one of the most serious privacy concerns in many data mining applications. A well-known privacy model for protecting against identity disclosure is k-anonymity. The main goal of anonymizing classification data is to protect individual privacy while maintaining the utility of the data in building classification models. In this paper, we present an approach based on rough sets for m...
Detecting Suspicious Card Transactions in unlabeled data of bank Using Outlier Detection Techniques
With the advancement of technology, the use of ATMs and credit cards has increased. Cyber fraud and theft are among the threats that result from using these technologies. It is therefore essential to use fraud detection algorithms to prevent fraudulent use of bank cards. Credit card fraud can be thought of as a form of identity theft that consists of unauthorized access to another person's ...
Enhancing the Utility of Anonymized Data by Improving the Quality of Generalization Hierarchies
The dissemination of textual personal information has become an important driver of innovation. However, because this data may contain sensitive information, it must be anonymized. A commonly used technique to anonymize data is generalization. Nevertheless, its effectiveness can be hampered by the Value Generalization Hierarchies (VGHs) used, as poorly-specified VGHs can decrease the use...
Enhancing the Utility of Generalization for Privacy Preserving Re-publication of Dynamic Datasets
Anonymized publication of static microdata can be achieved by Generalization, but with heavy information loss. A utility-enhanced alternative to Generalization, known as Angelization, produces the same level of anonymization but with minimal information loss. In reality, there may be a need to publish another version of the microdata after insertions and deletions. Anonymization is applicable to any generalizatio...
Publication date: 2013